On the prediction loss of the lasso in the partially labeled setting
Abstract
In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, we consider the simple setting of a bounded response variable and bounded (high-dimensional) covariates. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to the approximate sparsity, and the variance. They also demonstrate that the presence of a large number of unlabeled features may have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.

MSC 2010 subject classifications: Primary 62H30; secondary 62G08.
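To make the role of unlabeled features concrete, here is a minimal sketch of one natural way to adapt the lasso to partial labeling: the quadratic term is built from all available covariates (labeled and unlabeled), while the linear term uses only the labeled pairs. This is an illustrative assumption, not necessarily the exact estimator studied in the paper; the function name `semi_supervised_lasso`, the objective scaling, and the proximal-gradient (ISTA) solver are all hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def semi_supervised_lasso(X_lab, y, X_unlab, lam, n_iter=500):
    """Hypothetical sketch: minimize  b'Sb - 2 c'b + lam * ||b||_1  by ISTA, where
    S = X_all' X_all / N uses labeled AND unlabeled features (assumed form),
    c = X_lab' y / n uses the labeled sample only."""
    X_all = np.vstack([X_lab, X_unlab])
    S = X_all.T @ X_all / X_all.shape[0]   # Gram matrix from all N features
    c = X_lab.T @ y / X_lab.shape[0]       # cross term from the n labeled pairs
    # Lipschitz constant of the gradient of the smooth part (guarded against 0).
    L = max(2.0 * np.linalg.eigvalsh(S)[-1], 1e-12)
    beta = np.zeros(X_lab.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * (S @ beta - c)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

The intuition behind this design is that the Gram matrix `S` depends only on the covariates, so extra unlabeled rows can make it a better-conditioned estimate of the population covariance even when the labeled design alone is nearly degenerate.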